PageRank algorithm, fully explained

Amrani Amine
Towards Data Science
6 min read · Dec 19, 2020


The PageRank algorithm, or Google algorithm, was introduced by Larry Page, one of the founders of Google. It was first used to rank web pages in the Google search engine. Nowadays, it is used in more and more fields, for example to rank users in social media. What is fascinating about the PageRank algorithm is how a very complex problem leads to a very simple solution. In this post, I will explain the idea and the theory behind the PageRank algorithm. You just need some basics in linear algebra and Markov chains. We will use the ranking of web pages as a use case to illustrate the algorithm.

Photo taken by the author

Random Walk

The web can be represented as a directed graph where nodes represent web pages and edges represent the links between them. Typically, if a node (web page) i is linked to a node j, it means that i refers to j.

Example of a directed graph

We have to define what the importance of a web page is. As a first approach, we could say that it is the total number of web pages that refer to it. If we stop at this criterion, however, the importance of the pages that refer to it is not taken into account: an important web page and a less important one carry the same weight. A better approach is to assume that a web page spreads its importance equally over all the web pages it links to. We can then define the score of a node j as follows:

rⱼ = Σᵢ→ⱼ rᵢ / dᵢ

where rᵢ is the score of node i, dᵢ its out-degree, and the sum runs over all nodes i that link to j.

From the example above, we can write this linear system:

By moving the right-hand side of this linear system to the left-hand side, we get a homogeneous linear system that we can solve using Gaussian elimination. But this solution is limited to small graphs. Indeed, such graphs are sparse, and since Gaussian elimination modifies the matrix while performing its operations, we lose the sparsity and the matrix takes up much more memory. In the worst case, the matrix can no longer be stored at all.
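To make this concrete, here is a minimal NumPy sketch on a hypothetical 4-node graph (my own toy example, not necessarily the one in the figure). In matrix form the score equations read r = Pr, where P is the transition matrix introduced in the next section; the system (I - P)r = 0 has one redundant equation, which we replace with the constraint that the scores sum to 1:

```python
import numpy as np

# Hypothetical column-stochastic matrix: entry P[i, j] is 1/d_j if page j
# links to page i, and 0 otherwise.
P = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.0],
])

# Solve (I - P) r = 0 subject to sum(r) = 1.
A = np.eye(4) - P
A[-1, :] = 1.0                      # replace the redundant last equation by sum(r) = 1
b = np.array([0.0, 0.0, 0.0, 1.0])
r = np.linalg.solve(A, b)
print(r)                            # the scores of the four pages
```

This works here, but as explained above, direct elimination destroys sparsity and does not scale to web-sized graphs.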

Markov Chain and PageRank

Since a Markov chain is defined by an initial distribution and a transition matrix, the above graph can be seen as a Markov chain with the following transition matrix:

Transition matrix corresponding to our example

We notice that the transpose of P is row stochastic (equivalently, each column of P sums to 1), which is a condition for applying the Markov chain theorems.
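As a side note, here is how such a column-stochastic matrix can be built from a hypothetical adjacency matrix in NumPy (assuming for now that every page has at least one outlink; this toy A reproduces the matrix P used in the snippet above):

```python
import numpy as np

# Hypothetical adjacency matrix: A[i, j] = 1 if page i links to page j.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
])

out_degree = A.sum(axis=1)            # d_i, assumed nonzero (no dead ends yet)
P = (A / out_degree[:, None]).T       # P[i, j] = probability of jumping from j to i
assert np.allclose(P.sum(axis=0), 1)  # every column sums to 1: column stochastic
```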
For the initial distribution, let's consider that it is uniform:

π₀ = (1/n, 1/n, …, 1/n)ᵀ

where n is the total number of nodes. This means that the random walker chooses its initial node uniformly at random among all the nodes.

At every step, the random walker jumps to another node according to the transition matrix. The probability distribution is then computed at every step; it tells us where the random walker is likely to be after a certain number of steps. This distribution is computed using the following recurrence:

πₜ₊₁ = Pπₜ
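In code, this is just a repeated matrix-vector product (a quick sketch, reusing the toy matrix P from above):

```python
import numpy as np

P = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.0],
])

n = P.shape[0]
pi = np.full(n, 1.0 / n)   # uniform initial distribution pi_0
for t in range(50):        # simulate 50 steps of the walk
    pi = P @ pi            # pi_{t+1} = P pi_t
print(pi)                  # where the walker is likely to be after 50 steps
```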

A stationary distribution of a Markov chain is a probability distribution π with π = Pπ. This means that the distribution will not change after one step. It is important to note that not all Markov chains admit a stationary distribution.

If a Markov chain is strongly connected, which means that any node can be reached from any other node, then it admits a stationary distribution. This is the case in our problem. So, after an infinitely long walk, the probability distribution converges to a stationary distribution π.

All we have to do is solve this equation:

π = Pπ

We notice that π is an eigenvector of the matrix P with eigenvalue 1. Instead of computing all the eigenvectors of P and selecting the one that corresponds to the eigenvalue 1, we can use the Perron-Frobenius theorem.

According to the Perron-Frobenius theorem, if a matrix A is square and positive (all its entries are positive), then it has a positive eigenvalue r such that |λ| < r for every other eigenvalue λ of A. The eigenvector v of A associated with r is positive, and it is the unique positive eigenvector (up to scaling).

In our case, the matrix P is square and non-negative, and the stationary distribution π is necessarily non-negative because it is a probability distribution. Strictly speaking, P may contain zero entries, but the teleportation trick introduced below makes the matrix strictly positive, so the theorem applies. We conclude that π is the dominant eigenvector of P, with dominant eigenvalue 1.
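For a small matrix, we can cross-check this claim with a dense eigendecomposition (a sanity check only; for a real web graph, computing all eigenvectors this way would be far too expensive):

```python
import numpy as np

P = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.0],
])

eigvals, eigvecs = np.linalg.eig(P)
k = np.argmax(eigvals.real)   # index of the dominant eigenvalue
print(eigvals[k].real)        # equal to 1 (up to rounding)
pi = eigvecs[:, k].real
pi /= pi.sum()                # rescale the eigenvector into a probability distribution
print(pi)                     # the stationary distribution
```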

To compute π, we use the power method, an iterative algorithm that computes the dominant eigenvector of a given matrix A. Starting from an initial approximation b of the dominant eigenvector, which can be initialized randomly, the algorithm updates it until convergence as follows:

Power method algorithm
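Since the figure only sketches the pseudocode, here is a minimal runnable version in NumPy (the function name, tolerance, and iteration cap are my own choices):

```python
import numpy as np

def power_method(P, tol=1e-9, max_iter=1000):
    """Approximate the dominant eigenvector of a column-stochastic matrix P."""
    n = P.shape[0]
    b = np.full(n, 1.0 / n)                 # initial approximation: uniform
    for _ in range(max_iter):
        b_next = P @ b                      # b_{k+1} = P b_k
        b_next /= np.abs(b_next).sum()      # renormalize (L1 norm)
        if np.abs(b_next - b).sum() < tol:  # stop when the update is negligible
            return b_next
        b = b_next
    return b
```

Applied to the toy matrix above, power_method(P) returns the same vector as the dense eigendecomposition, but it only ever needs matrix-vector products, which is what makes it practical on huge sparse graphs.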

As mentioned before, the probability distribution at time t gives the probability that the walker is at a given node after t steps. The higher the probability, the more important the node. We can then rank our web pages according to the stationary distribution obtained with the power method.

Teleportation and damping factor

In the web graph, for example, we can find a web page i that refers only to a web page j, while j refers only back to i. This is what we call the spider trap problem. We can also find a web page that has no outlink at all; this is commonly named a dead end.

Dead ends and spider traps illustration

In the case of a spider trap, when the random walker reaches node 1 in the above example, it can only jump to node 2, and from node 2 it can only go back to node 1, and so on. The importance of all the other nodes ends up absorbed by nodes 1 and 2: the probability distribution converges to π = (0, 0.5, 0.5, 0), which is not the desired result.

In the case of a dead end, when the walker arrives at node 2, it cannot reach any other node because node 2 has no outlink. The algorithm cannot converge.

To get over these two problems, we introduce the notion of teleportation.

Teleportation consists of connecting each node of the graph to all the other nodes, so that the graph becomes complete. The idea is that, with probability β, the random walker jumps to another node according to the transition matrix P, and with probability (1-β), it teleports to a node chosen uniformly at random, i.e., it lands on each node with probability (1-β)/n. We then get the new transition matrix R:

R = βP + (1-β)evᵀ

where v is a vector of ones and e a vector whose entries are all equal to 1/n.

β is commonly called the damping factor. In practice, it is advised to set β to 0.85.
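A short NumPy sketch of this construction (dense for clarity; a real implementation would keep P sparse and apply the teleportation term implicitly):

```python
import numpy as np

def google_matrix(P, beta=0.85):
    """Build R = beta * P + (1 - beta) / n * J, where J is the all-ones matrix."""
    n = P.shape[0]
    return beta * P + (1.0 - beta) / n * np.ones((n, n))
```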

By applying teleportation in our example, we get the following new transition matrix:

Our new transition matrix

The matrix R has the same properties as P, and in addition all its entries are strictly positive. It therefore admits a stationary distribution, and all the theorems we saw previously apply.

That’s it for the PageRank algorithm. I hope you understood the intuition and the theory behind it. Please do not hesitate to leave comments or share my work.

Thank you!

